Geospatial features have proven over the years to significantly increase user engagement. In fact, we have grown so used to them that we now expect any kind of recommendation (news, restaurants, products) to be location-based. Most applications today rely mainly on radius-based queries, which can lead to imprecise or sub-optimal results. In a world where companies compete for the user’s attention, precision can be a key differentiator, which is why polygon-based search is becoming increasingly popular.
In this article, we will talk about polygon-based search in Couchbase, a feature available since Couchbase 6.6.
What is Geospatial Polygon Search
A geospatial query specifies an area, and returns each document that contains a reference to a location within the area. Areas and locations are represented by means of latitude–longitude coordinate pairs.
There are many flavors of geospatial query. Radius-based queries (also known as point-distance queries) are the most commonly used and the easiest to start with. All you need is a coordinate and the radius of the circle:
// The Java SDK expects longitude first, then latitude, then the distance
GeoDistanceQuery geoDistanceQuery = SearchQuery.geoDistance(-121.967463, 37.379403, "10000");
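To make the idea of a radius query more concrete, here is a minimal sketch of the check it performs conceptually: is a point within a given great-circle distance of a center? This is only an illustration using the haversine formula. It is not how Couchbase evaluates the query internally, and the class name, method names, and Earth radius constant are assumptions made for this example.

```java
// Illustrative only: a haversine distance check, not Couchbase's implementation.
final class RadiusCheck {
    // Mean Earth radius in meters (an assumed constant for this sketch).
    static final double EARTH_RADIUS_METERS = 6_371_000.0;

    // Great-circle distance in meters between two latitude/longitude points.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    // True when (lat, lon) lies inside the circle around the center.
    static boolean withinRadius(double centerLat, double centerLon,
                                double lat, double lon, double radiusMeters) {
        return haversineMeters(centerLat, centerLon, lat, lon) <= radiusMeters;
    }

    public static void main(String[] args) {
        // Center taken from the article; the second point is a made-up nearby location.
        System.out.println(withinRadius(37.379403, -121.967463, 37.40, -121.97, 10_000));
    }
}
```

Every point on the edge of the circle is treated the same way, which is why a radius query cannot follow the outline of a street, a building, or a neighborhood.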
In the bounding box search, you have to specify two latitude-longitude coordinate pairs instead. These are respectively taken to indicate the top left and bottom right corners of a rectangle. Documents are returned if they reference a location within the area of the rectangle:
// Fake coordinates, longitude first: top-left corner, then bottom-right corner
GeoBoundingBoxQuery query = SearchQuery.geoBoundingBox(-121.967463, 37.379403, -121.967463, 37.379403);
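As a rough illustration of what a bounding box match means, the sketch below tests whether a point falls inside a rectangle described by its top-left and bottom-right corners. The class and method names are hypothetical, and the check assumes the box does not cross the antimeridian (longitude 180), which a real search engine has to handle.

```java
// Illustrative only: containment test for a rectangle given by two corners.
final class BoundingBoxCheck {
    // The top-left corner carries the maximum latitude and minimum longitude;
    // the bottom-right corner carries the minimum latitude and maximum longitude.
    static boolean contains(double topLeftLat, double topLeftLon,
                            double bottomRightLat, double bottomRightLon,
                            double lat, double lon) {
        boolean latInRange = lat <= topLeftLat && lat >= bottomRightLat;
        boolean lonInRange = lon >= topLeftLon && lon <= bottomRightLon;
        return latInRange && lonInRange;
    }

    public static void main(String[] args) {
        // Made-up rectangle and point, for demonstration only.
        System.out.println(contains(38.0, -123.0, 37.0, -121.0, 37.379403, -121.967463));
    }
}
```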
The two methods above are great when approximate results are good enough, but they fall short when you need to restrict your search to a precisely delimited area:
Geospatial geometry/polygon search lets you look for documents located inside a closed polygon defined by a sequence of coordinates. In the image above, for instance, we want to limit our search to documents whose coordinates fall inside the polygon we have defined, which in this case represents a building.
There is virtually no limit on the number of points you can specify in a polygon query but, as with any other search engine, performance will naturally degrade with very complex polygons.
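For intuition, here is a minimal point-in-polygon sketch based on the classic ray-casting technique. It is an assumption-laden illustration, not the algorithm Couchbase uses: it treats longitude and latitude as flat x/y coordinates and ignores edge cases such as points lying exactly on a border. The class and method names are hypothetical.

```java
// Illustrative only: ray-casting point-in-polygon test on planar coordinates.
final class PointInPolygon {
    // polygon is an array of [lon, lat] pairs; it may or may not repeat the first point at the end.
    static boolean contains(double[][] polygon, double lon, double lat) {
        boolean inside = false;
        // Walk every edge (j -> i) and count how many cross a horizontal ray
        // starting at the point and extending towards increasing longitude.
        for (int i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
            double xi = polygon[i][0], yi = polygon[i][1];
            double xj = polygon[j][0], yj = polygon[j][1];
            boolean crosses = (yi > lat) != (yj > lat)
                    && lon < (xj - xi) * (lat - yi) / (yj - yi) + xi;
            if (crosses) {
                inside = !inside; // an odd number of crossings means the point is inside
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        double[][] square = {{0, 0}, {10, 0}, {10, 10}, {0, 10}, {0, 0}};
        System.out.println(contains(square, 5, 5));
    }
}
```

Note that the loop visits every edge once per tested point, which gives a feel for why polygons with a very large number of points are more expensive to evaluate.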
Geospatial polygon-based use cases
Polygon-based search (also referred to as geometry search) is not restricted to people analyzing satellite images. Many common use cases can benefit from it:
- Real Estate: Find offices/houses in a given village or in a specific area
- Gaming: Spawn items in specific areas (e.g., different types of Pokémon appearing according to the terrain in Pokémon Go)
- Analytics: Count how many people passed through a specific region (e.g., how many Uber/Lyft drivers were at the airport on a given day)
- Advertisement: Increase the CPC (Cost Per Click) when users are in a specific place (e.g., a shopping mall)
- Smart Cities: Notify citizens of a region about a potential threat (e.g., hailstorm, flooding)
Geospatial Search Growth
We can indirectly demonstrate how popular location-based search has become by comparing the percentage of apps requesting permission to access the user’s location over the years:
According to the graph above, back in 2013 only 11.68% of mobile apps requested access to the user’s location. This other report suggests that in 2014 nearly 24% of apps were requesting the user’s location:
source: https://www.statista.com/statistics/486440/leading-google-play-app-permissions/
If we fast forward to 2020, this third report suggests that 95% of apps in China request access to the user’s location:
Creating Geospatial indexes
For this demo, you will need this small dataset of earthquakes in the US. You can quickly load it into Couchbase by creating a bucket called earthquake, then clicking Documents -> Import Documents, selecting the earthquake.json file, and clicking Import Data:
Now let’s create our geo FTS index. First, go to the Search tab and click on “Add Index”. Then specify the following configuration:
- Name: earthquake_idx
- Bucket: earthquake
- Type Mappings:
  - Uncheck default
  - Add a new type mapping called earthquake
  - Insert a child field:
    - Field: geo
    - Type: geopoint
    - Searchable as: geo
Once the Index Progress reaches 100% we are ready to make our first polygon search.
Polygon / Geometric Search in Action
The search has two main requirements: fields with coordinates must be indexed using the geopoint type (as we did in the previous section), and the coordinates must form a closed polygon, which means that the first and the last coordinates must be the same. Here is an example of a valid polygon:
[[-103.230791,37.0258202],[-108.4746292,43.1130542],[-116.2949697,44.9554792],
 [-123.7047084,41.8493514],[-122.8710938,38.7540833],[-120.0585938,34.6693585],
 [-117.9492188,34.0162419],[-115.1367188,32.694866],[-109.8632813,27.6056708],
 [-104.2382813,19.5597901],[-97.5585938,16.8045411],[-100.0195313,23.0797318],
 [-102.5664232,29.0188937],[-103.230791,37.0258202]]
If we plot these coordinates on a map, this is what we end up with:
Source: https://www.keene.edu/campus/maps/tool/
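Since the polygon has to be closed, it can be handy to validate (or fix) a list of points before sending it. The helper below is a small sketch of that idea. The class and method names are hypothetical, and the minimum of four points (three distinct corners plus the repeated first one) is an assumption of this example rather than a documented Couchbase rule.

```java
import java.util.Arrays;

// Illustrative only: check and enforce that a ring of [lon, lat] pairs is closed.
final class PolygonRing {
    // A ring is considered closed when its first and last points are identical.
    static boolean isClosed(double[][] points) {
        return points.length >= 4
                && Arrays.equals(points[0], points[points.length - 1]);
    }

    // Returns the same array if the first and last points already match,
    // otherwise a copy with the first point appended at the end.
    static double[][] close(double[][] points) {
        if (points.length == 0
                || Arrays.equals(points[0], points[points.length - 1])) {
            return points;
        }
        double[][] closed = Arrays.copyOf(points, points.length + 1);
        closed[points.length] = points[0].clone();
        return closed;
    }

    public static void main(String[] args) {
        double[][] open = {{-103.230791, 37.0258202}, {-108.4746292, 43.1130542},
                           {-116.2949697, 44.9554792}};
        System.out.println(isClosed(close(open)));
    }
}
```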
Supported Coordinates Formats
The following formats for polygon coordinates are accepted:
- Single Array: [ "lat, lon", "lat, lon", "lat, lon", … ]
- Multiple Arrays: [ [ lon, lat ], [ lon, lat ], … ]
- GeoJson: [ { "lat": 1, "lon": 1 }, { "lat": 1, "lon": 1 }, … ]
- Geohash: [ "9q8zjbkp", "9q8yvvdh", "9q8yyp1e" ]
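The detail that is easiest to get wrong in the list above is the ordering: the array form is longitude first, while the string form is latitude first. The following sketch converts [lon, lat] pairs into the other two textual forms to make that swap explicit. The class and method names are hypothetical, and the strings are emitted without a space after the comma, which is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: convert [lon, lat] pairs into the string and object forms.
final class CoordinateFormats {
    // [lon, lat] -> "lat,lon" (note the swapped order).
    static List<String> toLatLonStrings(double[][] lonLatPairs) {
        List<String> result = new ArrayList<>();
        for (double[] pair : lonLatPairs) {
            result.add(pair[1] + "," + pair[0]);
        }
        return result;
    }

    // [lon, lat] -> JSON array of {"lat": ..., "lon": ...} objects.
    static String toObjectJson(double[][] lonLatPairs) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < lonLatPairs.length; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append("{\"lat\":").append(lonLatPairs[i][1])
                .append(",\"lon\":").append(lonLatPairs[i][0]).append('}');
        }
        return json.append(']').toString();
    }

    public static void main(String[] args) {
        double[][] points = {{-103.230791, 37.0258202}, {-108.4746292, 43.1130542}};
        System.out.println(toLatLonStrings(points));
        System.out.println(toObjectJson(points));
    }
}
```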
Geospatial Search using the REST API
Apart from using the native SDKs, you can also use the Full-Text Search REST API to make geospatial queries using the following format:
curl -XPOST -H "Content-Type: application/json" -u username:password \
  http://ip_address:8094/api/index/index_name/query \
  -d '{ "fields": [Fields_you_want_to_return], "query": {"field": "target_geo_field", "polygon_points": [...]}}'
Here is a real example using our dataset and index:
curl -XPOST -H "Content-Type: application/json" -u Administrator:password \
  http://localhost:8094/api/index/earthquake_idx/query \
  -d '{ "fields": ["Region"], "query": {"field": "geo", "polygon_points": [
    [-103.230791,37.0258202],[-108.4746292,43.1130542],[-116.2949697,44.9554792],
    [-123.7047084,41.8493514],[-122.8710938,38.7540833],[-120.0585938,34.6693585],
    [-117.9492188,34.0162419],[-115.1367188,32.694866],[-109.8632813,27.6056708],
    [-104.2382813,19.5597901],[-97.5585938,16.8045411],[-100.0195313,23.0797318],
    [-102.5664232,29.0188937],[-103.230791,37.0258202]]}}'
and here is the output of the command above:
{
  "status": { "total": 1, "failed": 0, "successful": 1 },
  "request": {
    "query": {
      "polygon_points": [
        { "lon": -103.230791, "lat": 37.0258202 },
        { "lon": -108.4746292, "lat": 43.1130542 },
        { "lon": -116.2949697, "lat": 44.9554792 },
        { "lon": -123.7047084, "lat": 41.8493514 },
        { "lon": -122.8710938, "lat": 38.7540833 },
        { "lon": -120.0585938, "lat": 34.6693585 },
        { "lon": -117.9492188, "lat": 34.0162419 },
        { "lon": -115.1367188, "lat": 32.694866 },
        { "lon": -109.8632813, "lat": 27.6056708 },
        { "lon": -104.2382813, "lat": 19.5597901 },
        { "lon": -97.5585938, "lat": 16.8045411 },
        { "lon": -100.0195313, "lat": 23.0797318 },
        { "lon": -102.5664232, "lat": 29.0188937 },
        { "lon": -103.230791, "lat": 37.0258202 }
      ],
      "field": "geo"
    },
    "size": 10,
    "from": 0,
    "highlight": null,
    "fields": [ "Region" ],
    "facets": null,
    "explain": false,
    "sort": [ "-_score" ],
    "includeLocations": false,
    "search_after": null,
    "search_before": null
  },
  "hits": [
    { "index": "earthquake_idx_61e3e23c5ecf99e1_acbbef99", "id": "8c70e570-5a6d-4f75-b409-f1bffd0417b5", "score": 0.0018446927534334827, "sort": [ "_score" ] },
    { "index": "earthquake_idx_61e3e23c5ecf99e1_acbbef99", "id": "8440a36e-fdb9-432e-b54b-13fbc1d482b4", "score": 0.0018446927534334827, "sort": [ "_score" ] },
    { "index": "earthquake_idx_61e3e23c5ecf99e1_acbbef99", "id": "c953d351-1811-4a2c-a2ba-8d0c94c9ffea", "score": 0.0018446927534334827, "sort": [ "_score" ] },
    { "index": "earthquake_idx_61e3e23c5ecf99e1_acbbef99", "id": "19fdf2ec-f53e-47ab-8bd3-70c1e6654c99", "score": 0.0018446927534334827, "sort": [ "_score" ] },
    { "index": "earthquake_idx_61e3e23c5ecf99e1_acbbef99", "id": "f95c8ec2-f58f-45f2-8c32-d7569e5d9b6b", "score": 0.0018446927534334827, "sort": [ "_score" ] },
    { "index": "earthquake_idx_61e3e23c5ecf99e1_acbbef99", "id": "9029d546-322f-4a2e-9919-fb85d2d17b42", "score": 0.0018446927534334827, "sort": [ "_score" ] },
    { "index": "earthquake_idx_61e3e23c5ecf99e1_acbbef99", "id": "5e0ae7e5-f825-44c6-b7d1-d398be7db081", "score": 0.0018446927534334827, "sort": [ "_score" ] },
    { "index": "earthquake_idx_61e3e23c5ecf99e1_acbbef99", "id": "1370a1f7-5f72-4c91-afcd-4ddef67e8a34", "score": 0.0018446927534334827, "sort": [ "_score" ] },
    { "index": "earthquake_idx_61e3e23c5ecf99e1_acbbef99", "id": "d9e4e8d9-6fbe-470b-b510-e688a1ccf1d3", "score": 0.0018446927534334827, "sort": [ "_score" ] },
    { "index": "earthquake_idx_61e3e23c5ecf99e1_acbbef99", "id": "b43ec13e-997d-45c7-954b-0433d4d6fa9e", "score": 0.0018446927534334827, "sort": [ "_score" ] }
  ],
  "total_hits": 353,
  "max_score": 0.0018446927534334827,
  "took": 118981353,
  "facets": null
}
Note that the coordinates must be specified in the polygon_points attribute. You can also extend this query to filter by other attributes of the document (e.g., Region, Magnitude).
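If you prefer to call the REST endpoint from Java without the SDK, the same request can be assembled with the JDK's built-in java.net.http classes (Java 11 or later). The sketch below only builds the request. The class name, method names, and the naive string concatenation used for the JSON body are assumptions made for brevity; a real application would use a JSON library and proper escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: build the polygon search request shown in the curl example.
final class FtsPolygonRequest {
    // Assembles the JSON body; polygonPointsJson must already be a valid JSON array.
    static String queryBody(String field, String polygonPointsJson, String returnedField) {
        return "{\"fields\":[\"" + returnedField + "\"],"
                + "\"query\":{\"field\":\"" + field + "\","
                + "\"polygon_points\":" + polygonPointsJson + "}}";
    }

    // Builds a POST request with basic authentication against the search port used in the article.
    static HttpRequest build(String host, String indexName,
                             String user, String password, String body) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(
                        URI.create("http://" + host + ":8094/api/index/" + indexName + "/query"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // A made-up triangle, closed by repeating the first point.
        String polygon = "[[-103.2,37.0],[-108.4,43.1],[-116.2,44.9],[-103.2,37.0]]";
        HttpRequest request = build("localhost", "earthquake_idx", "Administrator", "password",
                queryBody("geo", polygon, "Region"));
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually run the query, the request can be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), assuming a Couchbase node with the Search service is reachable at that address.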
Geospatial Ring / Donut Shaped Search
You can also specify one or more holes in your polygon in case you want to filter some specific areas out:
You can achieve something like the image above by using boolean queries:
{
  ...
  "query": {
    "must": {
      "conjuncts": [{"field": "geo", "polygon_points": {outer_polygon_coordinates}}]
    },
    "must_not": {
      // single or multiple disjunctions
      "disjuncts": [{"field": "geo", "polygon_points": {inner_polygon_coordinates}}]
    }
  }
}
In summary, you simply specify your polygon coordinates inside the “must” block and your holes inside the “must_not” block.
I highly recommend always using disjunctions when specifying holes. A conjunction would still work with a single hole, but with multiple holes it will likely not filter your data properly, because it only excludes documents whose coordinates are inside all of the holes. If you have no idea what I’m talking about, check out this documentation on compound queries.
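The difference between the two options is easier to see with plain boolean logic. The sketch below models the outer area and the holes as simple predicates (axis-aligned boxes stand in for polygons to keep it short) and compares both ways of combining the holes. The class and method names are hypothetical, and this only mirrors the logic described above; it does not call Couchbase.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: why holes should be combined with OR (disjunction).
final class DonutFilter {
    // A box over [lon, lat] points, used here as a stand-in for a polygon.
    static Predicate<double[]> box(double minLon, double minLat, double maxLon, double maxLat) {
        return p -> p[0] >= minLon && p[0] <= maxLon && p[1] >= minLat && p[1] <= maxLat;
    }

    // must: outer, must_not: hole1 OR hole2 OR ... (a point in any hole is dropped).
    static boolean keepWithDisjunction(Predicate<double[]> outer,
                                       List<Predicate<double[]>> holes, double[] point) {
        return outer.test(point) && holes.stream().noneMatch(h -> h.test(point));
    }

    // must: outer, must_not: hole1 AND hole2 AND ... (only points in every hole are dropped).
    // Assumes at least one hole is given.
    static boolean keepWithConjunction(Predicate<double[]> outer,
                                       List<Predicate<double[]>> holes, double[] point) {
        return outer.test(point) && !holes.stream().allMatch(h -> h.test(point));
    }

    public static void main(String[] args) {
        Predicate<double[]> outer = box(0, 0, 10, 10);
        List<Predicate<double[]>> holes = List.of(box(1, 1, 3, 3), box(6, 6, 8, 8));
        double[] insideFirstHole = {2, 2};
        System.out.println(keepWithDisjunction(outer, holes, insideFirstHole));
        System.out.println(keepWithConjunction(outer, holes, insideFirstHole));
    }
}
```

With two separate holes, a point sitting inside the first hole is dropped by the disjunction but kept by the conjunction, since it is not inside both holes at the same time.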
Further Reading
If you are interested in geo search, I highly recommend reading the official documentation. If you are new to Full-Text Search, check out this video showing how to create a Netflix-like search using FTS.
We also have a series of articles on Couchbase’s blog talking about important aspects of Full-Text Search:
- What is Geospatial Data?
- Why you should avoid LIKE %
- What Is Fuzzy Matching and How to Use It Correctly
- Building a Shazam-like app to understand how Tokenizers and Filters work
- Running FTS Queries in N1QL
- Text Analysis within a Full-Text Search Engine